
Fix Qwen3-TTS streaming memory leak. #585

Merged
lucasnewman merged 1 commit into Blaizzy:main from orbitalquark:fix-qwen3-tts-streaming-memory-leak
Mar 17, 2026

@orbitalquark
Contributor

When running the mlx-server and sending streamable "/v1/audio/speech" requests with the Qwen3-TTS model, memory usage grew by 3-4 GB per request until it was culled at around 10 GB. Then the cycle repeated.

The Qwen3-TTS streaming model does a good job of keeping memory usage down during the stream, but it does not make one final mx.clear_cache() call after yielding the last streaming chunk.
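The sketch below is not the code from this PR; it is a minimal illustration, assuming a generator-based streaming loop, of where the missing final call belongs. The function name `stream_chunks` and the `clear_every` parameter are hypothetical, and `clear_cache` is an injected stand-in for `mx.clear_cache()` so the example runs without MLX installed.

```python
from typing import Callable, Iterable, Iterator


def stream_chunks(
    chunks: Iterable[bytes],
    clear_cache: Callable[[], None],
    clear_every: int = 4,
) -> Iterator[bytes]:
    """Yield audio chunks, clearing cached memory during and after the stream.

    Hypothetical sketch: `clear_cache` stands in for mx.clear_cache().
    """
    if clear_every < 1:
        raise ValueError("clear_every must be >= 1")
    try:
        for i, chunk in enumerate(chunks, start=1):
            yield chunk
            # Periodic cleanup while streaming keeps usage down mid-stream.
            if i % clear_every == 0:
                clear_cache()
    finally:
        # The final cleanup after the last chunk. A `finally` block runs when
        # the generator is exhausted and also when it is closed early.
        clear_cache()
```

With five chunks and `clear_every=2`, the periodic branch fires twice and the `finally` block adds the third call that the periodic schedule alone would skip. Placing the final call in `finally` rather than after the loop is one possible design choice: it also covers a client that disconnects mid-stream, because closing the generator raises `GeneratorExit` at the paused `yield` and the `finally` block still runs.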

This PR fixes the leak.

@lucasnewman (Collaborator) left a comment

Thanks!

@lucasnewman merged commit 8083120 into Blaizzy:main on Mar 17, 2026
10 checks passed